Inference Optimization, VRAM Calculation, Performance Tuning, Resource Management

My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·9h·
Discuss: Hacker News
LLM Optimization
Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity Scores
arxiv.org·5h
LLM Optimization
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·6h·
Discuss: r/LLM
LLM Optimization
Unlocking AI Potential: Squeezing Giant Models into Tiny Spaces
dev.to·11h·
Discuss: DEV
LLM Optimization
How fast can an LLM go?
fergusfinn.com·3d·
Discuss: Hacker News
LLM Optimization
A hitchhiker's guide to CUDA programming
seanzhang.me·3d·
Discuss: Hacker News
LLM Optimization
Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide
bentoml.com·2d·
Discuss: Hacker News
LLM Optimization
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·1d·
Discuss: Substack
LLM Optimization
From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers
towardsdatascience.com·20h
LLM Optimization
How to access and use Minimax M2 API
dev.to·4h·
Discuss: DEV
LLM Optimization
Machine Scheduler in LLVM – Part II
myhsu.xyz·1d
LLM Optimization
ZkML Breakthrough: 13B Models Verified in 15 Minutes
lightcapai.medium.com·18h·
Discuss: Hacker News
LLM Optimization
Dynamic Resource Allocation in CXL-Enabled Heterogeneous Compute Clusters
dev.to·1d·
Discuss: DEV
LLM Optimization
MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter
towardsdatascience.com·21h
LLM Optimization
GPU Pro – Master Your AI Workflow
github.com·14h
🛠️Developer Tools
Generation at the Speed of Thought: Speculative Decoding
bittere.substack.com·22h·
Discuss: Substack
LLM Optimization
The Evolution of GPUs: How Floating-Point Changed Computing
dell.com·20h·
Discuss: Hacker News
💻Tech
Assessing DRAM Data Retention via Quantum-Tunneling Lifetime Mapping
dev.to·1h·
Discuss: DEV
LLM Optimization
Scalable In-Memory Associative Processing for Graph Neural Network Inference
dev.to·21h·
Discuss: DEV
LLM Optimization